perm filename ORAL[TLK,DBL]1 blob
sn#216659 filedate 1976-05-22 generic text, type C, neo UTF8
.!XGPCOMMANDS←"/TMAR=50/PMAR=2100/BMAR=50";
.DEVICE XGP
.FONT 1 "BDR40"
.FONT 2 "NGR40"
.FONT 3 "FIX25"
.FONT 4 "BDI40"
.FONT 5 "BDR66"
.FONT 6 "NGR25"
.FONT 7 "NGR20"
.FONT 8 "SUP"
.FONT 9 "SUB"
.TURN ON "↑α↓_π[]{"
.TURN ON "⊗" FOR "%"
.TURN ON "@" FOR "%"
.PAGE FRAME 41 HIGH 83 WIDE
.AREA TEXT LINES 1 TO 39
.AREA FOOTING LINE 41
.COUNT PAGE PRINTING "1"
.TABBREAK
.ODDLEFTBORDER←EVENLEFTBORDER←1000
.AT "<<" ENTRY ">" ⊂
⊗4<Still to do: ENTRY⊗*
. ⊃
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO E ⊂ APART END ⊃
.MACRO FAS ⊂ FILL ADJUST SINGLE SPACE PREFACE 1 ⊃
.MACRO S(ENTRY) ⊂ FAS SKIP 2
.IF LINES≤10 THEN START SKIP TO COLUMN 1 END;
.ONCE CENTER SELECT 5
*** ENTRY ***
. ⊃
.FAS
.COMPACT
.SELECT 1
.PORTION THESIS
.PAGE←-1
.NEXT PAGE
.INDENT 0
.TURN OFF "{∞→}"
.BEGIN CENTER
⊗5↓_Discovery in Mathematics_↓⊗*
⊗5↓_as Heuristic Search_↓⊗*
Ph.D. Oral Exam
Wednesday, May 19, 1976
⊗2Douglas B. Lenat⊗*
Artificial Intelligence Lab
Computer Science Department
Stanford University
.END
A program, called "AM", is described which models one aspect of
elementary mathematical research: developing new concepts under the
guidance of a large body of heuristic rules.
The local heuristics communicate via an agenda mechanism, a global
list of tasks for the system to perform and reasons why each task is
plausible. A single task might direct AM to define a new concept, or
to explore some facet of an existing concept, or to examine some
empirical data for regularities, etc. Repeatedly, the program
selects from the agenda the task having the best supporting reasons,
and then executes it.
Each concept is an active, structured knowledge module. A hundred
very incomplete modules are initially provided, each one
corresponding to an elementary set-theoretic concept (e.g., union).
This provides a definite but immense "space" which AM begins to
explore. AM extends its knowledge base, ultimately rediscovering
hundreds of common concepts (e.g., numbers) and theorems (e.g.,
unique factorization).
This approach to plausible inference contains some unexpected powers
and limitations.
.SKIP TO COLUMN 1
.EVERY FOOTING(,--- {PAGE} ---,)
.S(Introduction)
This talk
will fall naturally into 3 separate parts.
The last two of these will deal specifically with one
heuristic program, AM, which
emulates some of the activities involved in doing mathematics
research: namely, AM investigates a given collection of objects and
operators, and gradually enlarges it by defining new concepts. All
this is accomplished under the guidance of a large body of heuristic
rules.
But before discussing AM, let's spend a few minutes thinking about how
scientific discoveries are made; at least, about how they're made in
very elementary mathematics.
We'll gather a few hints from that analysis, and use them to propose
a simple ⊗4model⊗* of how mathematical theories are developed.
At that time we can turn to AM, and view it as a computer program
embodying that model.
The first thing I want to stress is the difference between solving a
given problem and thinking it up in the first place. Contrast
playing Monopoly with inventing new games of the same quality.
Or
contrast solving the missionaries and cannibals problem with the
ill-defined reasoning which led to formulating it.
Turning to mathematics: if I show you the definition of prime
numbers, <SLIDE primes> you can find some for me without any trouble.
Whoever first formulated that concept performed a much harder task.
.S(Discovering the Primes)
How did he do it? How can we rationally account for the discovery of
prime numbers? We'd like to propose a plausible scenario in which
someone decides to look at these numbers, and then decides they're
interesting and worth naming.
Imagine that a bright undergraduate math major has been given some
formal system of objects and operators, which he doesn't realize is
isomorphic to set, set-operations, numbers, counting, and arithmetic.
He's told to explore that formal system, finding out some new
relationships between those concepts which were given to him, and
encouraged to define some new concepts if they appear useful or
interesting to the student/researcher.
In that context, let me suggest to you one plausible scenario of the
discovery of prime numbers.
Suppose the researcher has just been playing with the notion of
"divisors-of" of a number. One rule of thumb he sometimes follows
when devising new theories is to consider extreme cases. <SLIDE goto
extremes> In more precise terms, this rule says that if you've just
been studying an interesting function f which takes elements of set A
into those of set B, and there is some extremal subset of B, it's
worth taking the time to consider which elements from A map into that
distinguished subset.
In the current situation, the function f is "Divisors-of". For
example, <SLIDE: Heur overlay> it takes the number 12 into the set of
numbers which divide evenly into 12, namely {1,2,3,4,6,12}. So f maps
a number into a set of numbers. An interesting or extreme subset of
B would thus be some extreme kind of set. We have names for
extremely small sets, for example: empty sets, singletons,
doubletons, tripletons. The rule of thumb thus says it's worth
considering the set of numbers over here which map into empty sets.
So it's worth isolating those numbers which map into singletons; the
set of numbers whose set of divisors is a doubleton; numbers whose
image under the function divisors-of is a tripleton.
In other words, this rule has suggested defining and studying the
following classes of numbers: the set of numbers with no divisors,
with only 1 divisor, with 2 divisors <SLIDE n divisors>, and with
three divisors. A priori, any or all of these sets could be
interesting. It will turn out that ⊗4this⊗* set (2 divisors) is the
most interesting and useful one. Why?
Just the definitions don't help. The researcher might take the
trouble to examine all the numbers less than a thousand, let's say,
and place them in the appropriate set, depending on how many divisors
they had.
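That divisor-counting experiment is easy to sketch (in modern Python rather than AM's INTERLISP; the names here are mine, purely illustrative):

```python
from collections import defaultdict

def divisors(n):
    """The function f = Divisors-of: maps a number to the set of its divisors."""
    return {d for d in range(1, n + 1) if n % d == 0}

# Place each number under 1000 into a set according to how many divisors
# it has: the inverse image, under Divisors-of, of each extreme set-size.
by_count = defaultdict(list)
for n in range(1, 1000):
    by_count[len(divisors(n))].append(n)

print(by_count[1][:5])   # just [1]
print(by_count[2][:5])   # numbers with exactly 2 divisors: the primes
print(by_count[3][:5])   # numbers with 3 divisors: squares of primes
```

Running this makes the empirical observation in the talk concrete: the 3-divisor class comes out as 4, 9, 25, 49, ... — exactly the squares of the 2-divisor class.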
Well, these two sets of numbers are very small and there's nothing
too interesting about them. Now we turn our attention to these two
sets of numbers... Nothing obviously interesting about these (2
divisors), but these (3 divisors) seem to all be perfect squares. A
natural question to ask is: what are square-roots of these numbers?
We then notice that all the members here just seem to be the squares
of the members from this set. This may cause us to spend some more
time studying both these sets.
After examining some more empirical data, the researcher may notice
that all numbers seem to factor uniquely into numbers drawn from this
set. This is far from trivial, and we'll go into how he does this
later on. He'll conjecture the unique factorization theorem <SLIDE
UFT> and notice that it can be made much less awkward if we give this
set a name, <SLIDE: Primes overlay> say PRIME NUMBERS.
Once he's got the definition of primes, a couple of good
reasons for giving them a name, and theorems involving primes, we shall
say that he's actually ⊗4discovered⊗* them.
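The unique-factorization conjecture itself can be checked empirically. A minimal sketch (my own naming, not AM's): enumerate every multiset of primes with a given product and confirm there is exactly one.

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, n))

def prime_factorizations(n, smallest=2):
    """All nondecreasing tuples of primes whose product is n."""
    if n == 1:
        return [()]
    return [(p,) + rest
            for p in range(smallest, n + 1)
            if n % p == 0 and is_prime(p)
            for rest in prime_factorizations(n // p, p)]

# Empirically: every number from 2 to 200 factors into primes in exactly one way.
assert all(len(prime_factorizations(n)) == 1 for n in range(2, 201))
```

Of course this only gathers evidence, as the researcher does; it proves nothing.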
.S(We Did a Problem-Reduction)
We haven't completely accounted for the discovery of prime numbers,
we've just reduced our problem from "how in the world could somebody
invent the notion of prime numbers" to the somewhat simpler task of
"how in the world could somebody invent the notion of divisors-of".
<SLIDE Reduc 1> The reduction was accomplished by citing a heuristic,
which said:
⊗4"Investigate the inverse image of unusual or extreme subsets of the
range of some operation."⊗*
In this case, the extreme subset was "doubletons", and the operation
was divisors-of. The inverse image of doubletons turned out to be
the set of numbers we know as the primes.
"Looking for extreme cases" is a pretty general heuristic.
It's not a rule of inference, it's just a rule of thumb; it doesn't
guarantee anything, in the sense that modus ponens guarantees to
preserve validity, or in the sense that Waltz's constraints on
shadowing line drawings were absolute.
This heuristic rule certainly isn't applicable or useful all the
time. And yet it's worth remembering and using because, in my
experience, it frequently leads to interesting, valuable new concepts
to investigate. In the long run, using this heuristic is
cost-effective.
A natural and fair question, now, is "OK, so you've rationalized the
discovery of prime numbers, but why was the researcher investigating
the concept `divisors-of' to begin with?". Have we really gotten
anywhere? If "divisors-of" was one of the primitive concepts we gave
that researcher to begin with, then the answer is Yes, we have in
fact explained how he might have discovered primes. But assume that
we didn't give him divisors-of, just simpler arithmetic operations.
Maybe we can use the same technique to explain why the researcher
proposed this concept, divisors-of. We'd like to grab some rule of
thumb and use it to rationalize that discovery.
But look at the definition of divisors-of(n) <SLIDE: Defn of
Divisors-of>. It's very closely related to the ⊗4inverse⊗* of the
function Times. A plausible heuristic tactic <SLIDE: Look at
inverse> is to investigate the inverse of each interesting operation.
So that rule would account for this discovery, reducing it to the
discovery of the concept of multiplication, because looking at the
inverse of multiplication would lead to defining and studying this
concept, divisors-of. Actually, one also has to notice that
multiplication is associative and commutative, i.e., it can be viewed
as taking a ⊗4bag⊗* or multiset of numbers as its argument.
If you can stand it, we might try to continue this reduction even
further. How could we rationally explain how ⊗4this⊗* concept might
have been discovered? You might consider discovering multiplication
as a numeric analog to the operation cross-product, or as repeated
addition. Addition could have been found as the numeric analogue of
union, or as repeated successoring. Successor is the numeric
analogue of CONS or insertion.
.SELECT 1
.S(This General Sequence of Reductions)
<chain SLIDE> We've performed a sequence of reductions, <Overlay:
arrows down> from primes to divisors-of to multiplication to addition
to counting to set-insertion. Each step was justified by some rule
of thumb: Here it is "consider extremals", here it is "consider the
inverse of an interesting relation", here and here it is "repeat an
interesting operation".
We could continue this analysis, but at some point the concepts we'd
have to deal with get so elementary that we feel it makes no sense to
talk about "discovering" them. At some point, they are the primitive
concepts we initially provided the college student researcher with.
I believe that when we hear of a new discovery, we mentally try to
analyze it in this fashion ourselves. We try to perceive a chain of
concepts, stretching from that ⊗4latest⊗* discovery backwards, not
all the way to our conceptual primitives, but at least all the way to
concepts we already are familiar with. If we succeed right away, we
probably think that the discovery wasn't so hard, that we could have
done it ourselves.
<Remove down-arrow overlay> Very often, a new discovery is really
just a couple steps in this kind of process, one or two heuristic
applications away from what is already known. That is why it's often
easy for us to believe that there wasn't really so much effort needed
to make the discovery. If we already know this concept, and this
heuristic, it almost seems obvious once we've heard that someone has
taken this step.
Why then don't we make new discoveries every few minutes? Why are
even some 1-step advances worth publishing? The answer can be found
by seeing how (according to this model) very simple new discoveries
are made.
We're now talking about reversing the flow of time. Instead of
analyzing a discovery by breaking it into simpler and simpler ones,
we're talking about the synthesis of new concepts, the activity which
the researcher was carrying out all along. He was continually trying
to grow new nodes on this tree, using whatever heuristic rules he's
learned over the years.
The researcher has a bunch of concepts that he knows about, say a few
thousand. I won't even consider his ⊗4legal⊗* choices, since they
are quite astronomical in number. Say he knows a few hundred
heuristics he can apply. In any situation, then, he may literally
have a million alternatives to consider, <OVERLAY of red lines>.
These are not his ⊗4legal⊗* choices, but his ⊗4plausible⊗* ones; each
and every one is justified by some heuristic, like this link is.
But nature is not kind to us, and only a very small percentage of
those new concepts will turn out to be interesting. What is even
worse, he can't whiz through this space, stopping at the interesting
concepts, because it may require hours -- sometimes even years -- of
research to decide whether the concept he's studying is worthwhile or
a dead-end.
Even here, where we have just a few concepts and heuristics, the tree
gets bushy when we try to go in this direction. We could apply
"looking at inverse" here, (cross-product) to decide to investigate
the kind of operation we normally call projection. Or here
(divisors-of) even if we decide to look for extremals, which way do
we go? Why not examine numbers which have very many, rather than
very few, divisors? Why not keep creating new operations, by
repeating the last one created, thereby discovering exponentiation,
hyper-exponentiation, and so on?
You may give me a special-case rebuttal for each of those
suggestions, but I think that rebuttal will be based on hindsight.
The analysis procedure was a search for a path from a given node back
to primitives, but the synthesis procedure is blind, doesn't have a
well-specified goal, so it is combinatorially more explosive.
So it's harder to make a valuable new discovery than to rationalize
how someone might plausibly have discovered something he's just told
you about. This explains why discoveries sometimes seem so obvious
⊗4after the fact⊗*, even to their inventor who may have labored years
on the problem.
The same let-down occurs when a magician shows you how he manages
some magic trick, or equivalently when an AI researcher like myself
shows you how his program was really able to manage some amazing
behavior.
.S(Introduce Heuristic Guidance)
Now it's clear that this synthesis task, exploring outward to find
new valuable concepts, is a very explosive search process.
The standard Artificial Intelligence method for muffling an explosion
of red arrows is to introduce purple and green heuristic guidance.
<2nd OVERLAY: green/purple>: purple strategies highlight the most
promising nodes to work on at a given moment, and ⊗4wiggly green
ideas suggest furiously⊗* one or two operators to try first on a
given node.
Recall that each red operator is a rule of thumb. So the green
strategies tell us, in a given situation, which heuristics are best
to apply.
From now on, when I use the term "heuristic rule", I'll really mean
a two-sided creature like this one, <SLIDE: heurs>; a
situation/action rule. It says that in some empirical situation, here
are some plausible things to spend your time doing.
Here, the rule's IF-part, or left-hand-side, was asking whether two
concepts have some nontrivial yet unexpected overlap. If so, the rule
concludes that it's worth the trouble to isolate and explicitly study
that overlap.
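As a hypothetical rendering (the names and the overlap test are mine, not AM's), such a two-sided rule is just a situation test plus a suggested action:

```python
def if_part(a, b):
    """LHS: do concepts a and b (viewed as sets of examples) have a
    nontrivial yet proper overlap?"""
    overlap = a & b
    return 0 < len(overlap) < min(len(a), len(b))

def then_part(a, b):
    """RHS: if so, it's worth isolating that overlap and studying it."""
    return {"action": "define and study new concept", "definition": a & b}

evens, squares = {0, 2, 4, 6, 8, 16}, {0, 1, 4, 9, 16}
if if_part(evens, squares):
    suggestion = then_part(evens, squares)   # isolate {0, 4, 16}
```

Nothing obliges the system to take the suggestion; the rule only nominates a plausible thing to do.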
.S(We Write a Theory Formation Pgm)
The basic idea to remember from all this is that simple mathematical
discovery might be successfully represented as a heuristic search
procedure: <SLIDE: 3 tasks> a continual expansion of a tree of
concepts and relationships, guided by some judgmental criteria, some
informal rules of thumb.
To test this hypothesis, I wrote a computer program, called A.M.,
which proposes new concepts in simple mathematical domains. This
program goes through the same kinds of concept exploration and
development that the hypothetical undergraduate researcher did.
AM is to be a heuristic search program, and in the tradition of AI
programs of this form, there are three separate things I should tell
you about: the same things you always have to do to implement a
heuristic search:
.BEGIN INDENT 0,3,0 PREFACE 0
(1) specify the initial core of knowledge that it will start with,
(2) specify the legal ways of adding to that knowledge
(3) collect the strategies and tactics that experts in that field use
when trying to form theories.
.END
.S(Mathematical Theory Formation)
If anyone asks at the end, I'll discuss why mathematics is an ideal
field in which to try out these ideas. For now, assume that we've
settled on very elementary math. Before carrying out these 3 steps,
I should at least mention what I mean by a ⊗4mathematical concept⊗*
and a ⊗4theory⊗*.
A mathematical theory <SLIDE: thy> is built on a ↓_⊗2FOUNDATION⊗*_↓
of primitive, undefined ↓_objects_↓, and ↓_operators_↓, plus some
postulates and axioms which interrelate them. Each of these
qualifies to be called a mathematical ↓_concept_↓.
Onto this foundation is built a tower of theorems -- statements which
are implied by the foundation -- plus some definitions of new
concepts in terms of preexisting ones.
This is the final, breath-taking architecture that we see in math
textbooks. The foundation is laid carefully, and the edifice is
built smoothly, brick by brick, with no mistakes, no backtracking.
OPTIONAL:
.ONCE INDENT 5,5,5
<math textbook-order slide> All undefined objects and axioms are
stated, and then an infinite define-state-prove loop is entered.
Necessary lemmata are stated and proved before the theorems which use
them. Occasionally, this process is broken by some post-facto
motivation or real-world application. This entire process never
involves backtracking.
What they ⊗4don't⊗* show you in textbooks and journals is the
unaesthetic scaffolding that was required to erect that tower: that
scaffolding is empirical induction and good old-fashioned search, with
guessing, back-tracking and everything.
At the very elementary levels, at least, mathematics is an empirical
science. The AM program actually proceeds in almost the opposite
order from textbooks, as it develops new relationships and defines
new concepts.
Most of the mathematician's empirical forays are fruitless, but there
is no way to know this before carrying them out. His only guidance
are his intuitions about what to investigate next, his purple and
green heuristic strategies.
This same character of searching a large space also carries over to
AM.
Let me just pause a moment and mention that AM is written in
INTERLISP; it is about 150K big, and it lives on a SUMEX PDP-10 KI-10. It
is a running system, and has managed to discover all the concepts and
theorems which I have mentioned so far, and the ones I will mention.
<3tasks, revisited SLIDE> Recall that there is a standard set of
tasks to perform, to create such a system.
.SELECT 1
.S(Writing AM)
The first step was to list all the concepts the system would know
about initially. That is, a bunch of primitive notions which are the
starting state of the system's knowledge.
<SLIDE concept> Here are the hundred or so concepts that AM started with.
For reasons I'll go into if anyone asks, I chose a small set of concepts
which any mathematically-sophisticated four-year-old would have,
concepts which Piaget might have called
⊗4pre-numerical⊗*.
Included here are static structural concepts like sets and
ordered-pairs, and active ones like union, reverse,
compose-2-relations, repeat, and so on.
Note that there is no
notion of numbers or proof.
At a slightly deeper level, we have to decide precisely what
information the system will know about each of these concepts.
Certainly we should provide a ↓_definition_↓ for each concept, and a name.
For each operation, we should also mention its ↓_domain and range_↓, and
give an algorithm for computing it.
What else should be
supplied initially about each concept?
To help AM decide what to do, it would be nice to supply some kind of
rough judgment of the worth of each concept: its usefulness, its
simplicity, symmetry, etc. Instead of grappling with the metaphysics
of this problem, I simply attached, ad hoc, a number to each concept,
which was supposed to represent its overall worth.
If we know of any heuristic rules relevant to dealing with that type of concept,
they might be tacked onto it.
So there are many facets to each concept, <SLIDE facets>. Some of the
ones we didn't mention yet were:
Examples, Generalizations, Specializations,
and Analogies.
This set of facets is fixed once and for all. Each concept can
possess any or all of these facets. But that's all we can ever know
about a concept. A concept, as represented inside AM, ⊗4is⊗* simply
a bunch of facets, drawn from this fixed set.
In the LISP program implementing these ideas, each concept is stored
as an atom, and its facets are implemented as the properties on its
property list.
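In a modern sketch (Python dicts standing in for INTERLISP atoms and property lists; the facet names are a representative subset, and the sample entries are mine):

```python
# The fixed set of facets any concept may possess (illustrative subset).
FACETS = ("definitions", "domain-range", "algorithms", "worth",
          "examples", "generalizations", "specializations",
          "conjecs", "analogies", "heuristics")

# A concept *is* simply a bundle of facets; most facets start out blank.
union = {
    "definitions": [lambda a, b: a | b],
    "domain-range": ("Structure", "Structure", "Structure"),
    "worth": 500,        # the ad-hoc overall rating attached by hand
    "examples": [],      # blank at first; to be filled in later
}
assert all(facet in FACETS for facet in union)
```

The analogy is direct: the dict key plays the role of the property name, and the value plays the role of the property-list entry.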
Let's consider one particular concept, and see what kinds of
information could go into each facet. We'll consider the concept
COMPOSE, which refers to the composition of two relations. We define
AoB(x) as A(B(x)). That is, apply B and then apply A to the result.
<COMPOSE slide> <go over each facet>. Here is an abbreviated
description of that concept.
Several equivalent definitions are provided. Some are computationally
efficient, and others are slow but especially easy for the system
to analyze. Each definition is an executable LISP predicate, which
takes 3 arguments and tries to see whether the composition of the
first two is equal to the third.
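A sketch of that definition, plus a slow predicate-style form like AM's (in Python; AM's actual definitions were executable LISP predicates):

```python
def compose(a, b):
    """A o B: apply B first, then apply A to the result."""
    return lambda x: a(b(x))

def is_composition(a, b, c, samples):
    """Predicate form: does c behave like the composition of a and b
    on the sample inputs?"""
    return all(c(x) == a(b(x)) for x in samples)

double = lambda x: 2 * x
inc = lambda x: x + 1
assert compose(double, inc)(3) == 8          # double(inc(3)) = 2 * 4
assert is_composition(double, inc, lambda x: 2 * (x + 1), range(10))
```

On sample inputs one can likewise gather evidence for the associativity conjecture mentioned below: compose(compose(a, b), c) and compose(a, compose(b, c)) agree wherever you test them.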
The Domain/range facet notes that Compose accepts a pair of activities,
and creates a new one.
The specializations facet points to another concept which is similar
to Compose. This specialization only composes an activity with
itself.
An ⊗4example⊗* of Composition is provided here.
One conjecture that AM found and stored here is that composition is
associative.
Down here we have some heuristics for dealing with compositions. An
overall rating of this concept, how to spot an
interesting composition (e.g., if the domain and range of the composition are
equal), hints for filling in certain facets of any new composition
which gets created, and something worth checking about each
composition.
The format of each facet is fixed, but the formats vary from facet to
facet, from productions <heurs> to little programs <algs>, to bits of
declarative data <d-r>.
Let me put back the list of concepts again, and also our list of
facets.<SLIDE>
I provided most of the contents of these facets by hand, for each of the
initial 100 concepts. AM later filled out the other facets.
Whenever AM created a new concept, it had of course no
"hand-supplied" facets at all for it.
.S(What AM Does)
What does the system actually do, now? From a heuristic search
viewpoint, the primary activity of the system is to create new
concepts. These are the highly-visible "legal moves": taking existing
concepts and combining them, conjoining them, disjoining them, running
them, and modifying their definitions. This plethora of alternatives
is too big to consider, so AM only creates a new concept when some
heuristic rule indicates that it might be worth the gamble. So the
heuristics act like little plausible move generators.
When AM does create a new concept, most of its facets will be blank,
so that in practice most of the system's time is spent in fleshing
out some recently-discovered concept, filling in some of its facets.
In other words, there are two ways that AM expands its knowledge. One
way is to pick an already-known concept <SLIDE: Compose> and try to
find out more about it, by filling in some facet of it <SLIDE:
Compose.exs overlay>. The second way AM expands its knowledge is by
defining new concepts out of old ones <SLIDE: ∪o' overlay>.
In the LISP implementation, that translates to (1) filling out the
properties of existing data stuctures (here some examples of
Compositions have been found), and sometimes (2) creating entirely
new data structures (Here the system takes one particular example of
Compose and promotes it from being just one entry on one facet of one
concept, into being a full-fledged concept module with facets of its
own).
What happens when a new concept is created? Well, at the moment of
its birth, AM knows its definition, and something about its worth.
Also how it connects to some existing concepts. So a few seconds are
spent filling in all the facets whose contents are obvious and which
would be much harder to fill in later on. Once it's created, most of
its facets will still be empty, so there will be many new things for
the system to do.
Since the "fleshing out" activity is quite expensive, it would be a
mistake to just try to fill out all the facets of all the concepts.
For example, AM starts off with 115 concepts. Of the 2 dozen
possible facets they might possess, only a few are filled-in for each
concept; there are 2000 blank facets initially. It would take days
of cpu time for them all to get filled in. And each time a new
concept was proposed, an hour might be shot working just to flesh it
out completely. AM must manage its time more carefully.
There must be some "fine structure" to the process of growing a new
node onto the tree of concepts. This is a non-standard problem, not
faced during your typical heuristic search. We can't permit the
system to completely fill out each new node it wants to look at.
The solution I use is to have AM maintain a list of very plausible
tasks, and repeatedly choose the most promising of them to execute.
Each task is at the level of filling out a particular blank facet of a
particular concept. So a few facets of some concept will get filled
out, and the situation at that moment might then cause AM to shift to
filling in some facets of an entirely different concept. When a new
concept gets created, most of its facets may remain blank for a long
period of time.
The LISP implementation of this idea is to have a global job-list,
<SLIDE joblist> an agenda of suggested activities to do, an ordered
list of candidates for the system's attention. Each entry consists
of a specific task for the system to work on, a priority value, and a
list of reasons why this is a reasonable task to do. Except for this
list of symbolic reasons, this is the same as the agenda mechanism
proposed by Winograd, Bobrow, and Kaplan for their new language, KRL.
At some moment, there might be hundreds of entries on the agenda;
this one indicates that the system should spend some time filling in
examples of the concept Primes.
This agenda governs the top-level control structure of the program.
The system picks the highest priority job off this list, and executes
it.
The priority rating is a function of the values of these reasons.
They in turn are proposed locally, by the heuristic rule which
suggested this job in the first place. We'll see this in more detail
in a minute.
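The mechanism can be sketched as follows (a minimal stand-in: the task names, ratings, and the priority formula here are my own, not AM's actual ones):

```python
class Agenda:
    """A global job-list: tasks, each with symbolic reasons and ratings."""
    def __init__(self):
        self.tasks = {}                       # task -> list of (reason, rating)

    def propose(self, task, reason, rating):
        reasons = self.tasks.setdefault(task, [])
        if reason not in (r for r, _ in reasons):
            reasons.append((reason, rating))  # record a genuinely new reason

    def priority(self, task):
        # Stand-in for AM's global formula; any monotone combination
        # of the reasons' ratings illustrates the idea.
        return min(1000, sum(rating for _, rating in self.tasks[task]))

    def best(self):
        return max(self.tasks, key=self.priority)

agenda = Agenda()
agenda.propose("fill in examples of Primes", "no examples are known yet", 300)
agenda.propose("check examples of Odd-primes", "examples exist but are unchecked", 150)
assert agenda.best() == "fill in examples of Primes"
```

Note that proposing a task again with a reason it already has changes nothing, while a new reason raises its priority; that matches the reason-comparing behavior described later in the talk.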
In the process of performing a job, 3 kinds of things can happen:
.BEGIN PREFACE 0 INDENT 3
(1) blank facets may get filled in. In the case of this task, the
Examples facet of the Primes concept would hopefully get filled in.
(2) new concepts may get created. Up here, when this task is
executed, the resultant generalizations will be full-fledged concepts
themselves.
(3) new tasks may get added to the agenda. After filling in examples
of primes, they might be so rare that the system would add a task of
the form "Generalize the concept of Prime numbers".
.END
<SLIDE: 3 actions> Each of these activities is guided by a heuristic
rule. I'd like to take a minute and illustrate how a heuristic rule
can manage each of these kinds of actions. The basic idea is that a
heuristic rule triggers, because its left side is relevant to the
current state of the world. The right side of the rule contains a
list of actions to perform, and those actions are of these three
types.
.S(Action 1: Fill in entries of facets)
The first kind of action that a heuristic rule can direct is the
filling in of specific pieces of information about a particular
concept.
For example, here's a rule <SLIDE: fill exs> which gives one strategy
for filling in examples of any concept C. It says to find an
operation whose range is C, then find examples of the domain of the
operation, and simply run that operation and gather up all the values
it returns. Those values should then be examples of C.
Here is how it might be used to fill in some examples for Sets,
assuming a few were already known. It would say to find an operation
whose range is Sets. One such known operation is Union. Then it says
to run that operation, and the results will be examples of Sets. Sure
enough, if we do that, we do produce lots of sets. Some of them will
be sets which weren't known before.
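That strategy can be sketched with C = Sets and the operation Union (Python frozensets standing in for AM's structures; the particular examples are mine):

```python
# Known examples of the concept Sets:
known_sets = [frozenset(), frozenset({1}), frozenset({2, 3})]

def union(a, b):            # an operation whose range is Sets
    return a | b

# Run the operation on examples of its domain and gather up the values:
# by Union's domain/range facet, every value returned is an example of Sets.
gathered = {union(a, b) for a in known_sets for b in known_sets}
new_examples = gathered - set(known_sets)   # sets not previously known
```

Here the one genuinely new example is the union of {1} and {2, 3}.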
A similar kind of heuristic rule is used to find conjectures: Here is
↓_a rule <SLIDE: find conjec> which has AM_↓ ask if a known
generalization of concept X is really equivalent to it. If so, this
gets recorded, as an entry on the Conjecs facet of the concepts
involved. This is the only way that AM noticed any conjectures: when
specifically directed to do so by some heuristic rule or other.
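The check behind that rule is purely empirical: compare the concept's definition with its generalization's on the stored examples. A hypothetical sketch (names mine):

```python
def seems_equivalent(defn, general_defn, examples):
    """True if the two predicates agree on every known example."""
    return all(defn(x) == general_defn(x) for x in examples)

equal       = lambda pair: pair[0] == pair[1]
same_length = lambda pair: len(pair[0]) == len(pair[1])  # a generalization of Equal

pairs = [([1, 2], [1, 2]), ([1], [1, 2]), ([1, 2], [3, 4])]
# The last pair separates them, so no conjecture "Equal = Same-length"
# would be recorded on the Conjecs facets here.
assert not seems_equivalent(equal, same_length, pairs)
```

When no stored example separates the two, the conjectured equivalence gets recorded; more data can later overturn it.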
.S(Action 2: Creating new concepts)
<SLIDE: remove overlay from 3 actions> We've seen how heuristics can
suggest new jobs to add to the agenda. How can they cause new
concepts to be created?
Consider this heuristic: <SLIDE: Create-con>. It says to look for the
inverse image of extreme elements under an interesting operation.
We already saw how this heuristic can lead to proposing the
definition of prime numbers, when the operation F is divisors-of.
The heuristic contains enough information to define the new concept,
and to fill in any other information which is trivial now but might
be very hard to reconstruct later on (like why the concept was
created).
.S(Action 3: Proposing a new job)
<SLIDE: remove overlay from 3 actions> The third kind of action that
a heuristic rule can initiate is to suggest a new task which then
gets added to the agenda. Let's see how this happens.
A heuristic rule, like this one <SLIDE: New-job> will trigger because
its left-hand side is satisfied. Somewhere among the actions to
take, listed here on its right side, is one which explicitly calls
for a new task to be added to the agenda. The rule says that if a
predicate appears to be rarely satisfied, Then the following task is
plausible: fill in some generalized forms of that predicate. Notice
that the rule also specifies an English phrase which is the ⊗4reason⊗*
why the task is now plausible, and also supplies a way to rate the
overall worth of that reason.
The precise numbers turn out not to be important.
.SELECT 1
Let's see how this particular rule might be used to suggest a new
task for the agenda. Say AM recently worked on finding things which
were Equal to each other, but that very few random pairs of objects
turned out to be equal. Then this rule might trigger, and would
propose this task: <SLIDE: task1>. If this task wasn't already on
the agenda, it would get inserted there.
But suppose ⊗4this⊗* <SLIDE: partial agenda> was the state of the
agenda at the time. We see right here that the task is already on the
agenda, but for a different reason. In such a case, the new reason
would be recorded, and the priority value of the task would get
boosted. <Full agenda> Here is the way that task would look. The new
reason has raised it to the top of the agenda.
This turned out to be a surprisingly important mechanism. When a
newly-proposed job turns out to be ⊗4already⊗* on the agenda, then
its reasons are examined. If the job is being proposed for pretty
much the same reasons as before, then its priority rating is
incremented only slightly. But if it is being proposed for a new
reason, as in this case, then its priority is increased tremendously.
This is the practical importance of maintaining reasons for each job.
AM can compare two reasons to see if they are equal, and that's all
it has to do to manage this scheme. Other than that, there is very
little that AM can do to a reason, except type it out to the user.
Each reason is actually just a token, but as you can imagine they're
quite useful for keeping the human user aware of what the system is
doing and why.
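A minimal sketch of that bookkeeping, in Python rather than AM's LISP (the rating constants and the 10-to-1 novelty ratio are illustrative, not AM's actual numbers):

```python
# Each agenda entry keeps its supporting reasons and a priority value.
agenda = {}  # maps a job (a tuple) to {"reasons": [...], "priority": n}

def propose(job, reason, rating):
    """Add a job, or merge a new proposal into an existing entry."""
    entry = agenda.setdefault(job, {"reasons": [], "priority": 0})
    if reason in entry["reasons"]:
        entry["priority"] += rating // 10   # same reason: slight bump
    else:
        entry["reasons"].append(reason)
        entry["priority"] += rating         # new reason: big boost
    return entry

propose(("generalize", "EQUAL"), "few random pairs are EQUAL", 400)
propose(("generalize", "EQUAL"), "few random pairs are EQUAL", 400)  # repeat
propose(("generalize", "EQUAL"), "EQUAL is very interesting", 300)   # novel
```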
In general, the priority value of a task on the agenda depends on the
ratings of all the supporting reasons. The reasons are rated
locally, by the same heuristic rules which proposed them. A single
⊗4global⊗* formula then looks at the job and the reasons, and assigns
it a single priority rating from 0 to 1000. <SLIDE: Formula> That
formula was just guessed at as a first pass, and it's worked well
enough not to tamper with it. It involves square roots and
dot-products of vectors and isn't really worth delving into.
Each concept, each facet, each reason supporting a job on the agenda
has a number attached to it. This formula uses those numbers to
attach an interestingness factor, a priority rating, a number, to
each job on the agenda.
<SLIDE: Put back full agenda> AM looks over the job-list, picks the
job with the highest priority, and tries to execute it. The flow of
control is so distributed that it's almost unpredictable.
That priority rating of the chosen task then dictates how to manage
the system's resources: it indicates the number of cpu seconds to
spend before giving up and trying the next task on the agenda, and it
specifies the number of list cells that the task is permitted to use
up before it must quit, and so on. (This idea was inspired by Carl
Hewitt's notion of activation energy.)
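To give the flavor, and only the flavor (AM's real formula and constants were different), here is a stand-in with the same shape: the square root of a dot product feeding a 0-to-1000 rating, which in turn sets the resource quanta.

```python
import math

def priority(reason_ratings, concept_worth, facet_worth):
    """Illustrative stand-in for AM's global formula: combine the
    ratings of the supporting reasons with the worths of the concept
    and facet involved, yielding a single 0-1000 rating."""
    strength = math.sqrt(sum(r * r for r in reason_ratings))
    score = strength * (concept_worth + facet_worth) / 2000
    return min(1000, int(score))

def resources(rating):
    """Higher-priority tasks get more cpu time and more list cells
    before they must quit (quanta here are made up)."""
    return {"cpu_seconds": 1 + rating // 10, "list_cells": 100 * rating}
```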
We are assuming that there is a self-consistent set of computational
rules for manipulating the estimated worth of all the concepts and
jobs. This is at best a first-order theory of interestingness.
Comparing the worths of "equality" and "primes" is really comparing
apples to oranges.
I think that this scheme works adequately here only because neither
the heuristics nor the jobs are strongly coupled; they are all pretty
much independent of each other.
For example, it doesn't really matter which order the top few jobs
get executed in; no one cares whether we look for primes right before
generalizing equality, or right afterwards. What ⊗4does⊗* matter is
that they are near the top, not near the bottom, of the agenda.
That's why a crude kind of evaluation function is all you need.
.SELECT 1
.S(Locating Relevant Heuristics)
<SLIDE: where we are> We've seen 3 kinds of actions a heuristic rule
can take. Furthermore, we've seen how the agenda mechanism always
keeps AM working on a plausible task. But we're not really in
business until I explain one more thing. Once a task is chosen, how
are the right heuristic rules located, the ones which will be
relevant to the task just chosen?
There are about 250 heuristic rules altogether in AM; that's too many
to simply evaluate the left sides of ⊗4all⊗* of them.
Once that's explained, I'll describe AM's overall behavior.
Like a human mathematician, AM should only access a strategy in those
situations where it could plausibly be used. Looking over the
heuristics, I found that each one has a very clear domain of
applicability, of relevance.
For example, <SLIDE: compose heur> one heuristic says that a
composition FoG is interesting if it preserves some of the unusual
properties of F and G. This is a useful thing to know when trying to
judge the interestingness of a composition, but it won't help you if
the current task is to check examples of Lists. We may as well
associate this heuristic with the concept Compose. We'll record this
heuristic rule inside the Interest facet of the Compose concept.
That's what this orange line means.
Other heuristics are a little bit more general, they relate to any
operation <SLIDE: Operation heurs overlay>, so they would be tacked
onto the concept Operation. This one is an entry on the Fillin facet
there. This one is on the Interest facet of the concept Operation.
It says that any operation is interesting if its domain and range are
identical.
As AM was written, I gathered heuristics by introspection, and tacked
them onto the concept I thought they were most relevant to. <SLIDE:
Genl branch> The more general the heuristic rule, the weaker it is,
typically. So each individual heuristic rule is tacked onto the most
general kind of entity to which it applies. Whenever I thought of a
heuristic, I pushed it as high as possible in this tree of concepts.
When dealing with any concept X, all the heuristics attached to any
generalization of X, that is, any higher node, are also considered to
be relevant to X.
Say AM wants to see how interesting this particular concept is. Then
AM could use evaluation heuristics gathered from any of these
concepts. Any reason why an operation might be interesting will also
be valid here.
The system would ripple upward, in the direction of increasing
generality, collecting heuristics as it went. They would be
evaluated, and their weighted average result would be taken as the
estimated interestingness of this concept.
In fact, <OVERLAY> ⊗4these⊗* features are satisfied by this
operation, so they combine to indicate it is a mildly promising
concept. Notice that the further upward we go, the less specific the
heuristics get, and the less useful they are.
If AM wanted to fill in generalizations of this concept, it would
ripple upwards again, gathering heuristics for filling in
generalizations. Any technique for generalizing an operation would
certainly work here too.
The same kind of rippling would collect relevant heuristics if AM
wanted to fill in examples of a certain concept, or check the entries
of the domain/range facet of this concept.
Since the heuristics attached to these (specific) concepts are
typically more powerful than the heuristics way up here, AM always
executes the heuristics in the order they are collected. No
sophisticated reasoning is done to order them in some way prior to
obeying them.
This works [in practice] because the heuristics rarely couple; at
most, they superimpose their effects; in either case the ⊗4order⊗* in
which they are executed is not too important. They don't interact
much. Simple schemes like this only work when the components of the
system are very independent of each other, when the problem was
easily decomposable or factorable to begin with. Tight
interdependence would require a more sophisticated approach.
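The rippling itself is simple; here is an illustrative Python sketch, with a toy generalization tree and toy heuristics standing in for AM's:

```python
# A fragment of the generalization tree (each concept points upward).
generalization = {
    "Equality": "Predicate",
    "Predicate": "Operation",
    "Operation": "Activity",
    "Activity": "Any-concept",
}
# Heuristics tacked onto the most general concept they apply to.
heuristics = {
    "Operation": ["interesting if domain = range"],
    "Activity": ["fill in examples by running on random arguments"],
    "Any-concept": ["fill in examples by instantiating the definition"],
}

def ripple(concept):
    """Walk up the generalization links, gathering heuristics in
    order of decreasing specificity (and, typically, decreasing power)."""
    found = []
    while concept is not None:
        found.extend(heuristics.get(concept, []))
        concept = generalization.get(concept)
    return found
```

Because the collected order already runs from specific to general, AM executes the heuristics exactly as gathered, with no further sorting.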
That's one reason AM couldn't progress indefinitely far in
mathematics, at least AM as it now stands.
.S(Cardinality Example)
Let's finally take a look at the program in action. <SLIDE:
Cardinality slide>
.SELECT 1
This is a condensed excerpt from a session with AM, as it first
discovers the concept of numbers. AM already knew about sets, bags,
lists, etc., and equality of 2 sets. A bag is a multiset, an
unordered structure with repeated elements allowed.
I'll read through it once to show you that AM really performs magic,
then we'll peek behind the curtains to see that AM is really no more
than the straightforward data-structure expander that I've been
describing.
(1) After testing random structures to see if they are equal, AM
concludes that it would be worthwhile to generalize the predicate
EQUAL. The user asks "why" and AM mentions that few randomly selected
objects turned out to be equal to each other.
(2) Two generalizations of EQUAL are constructed. The first turns
out to be the relationship "has the same length as", and the second
is "has the same first element as".
(3) After testing random structures to see if they have the
SAME-LENGTH, AM decides to try to find a canonical form for all
structures. This would mean that two structures were of the same
length if and only if their canonical forms were identically equal.
(4) AM comes up with the awkward idea of making this conversion by
(i) converting each structure to a BAG, and (ii) replacing each
element by the constant T. Thus the canonical form of our alphabet
would be a bag of 26 T's. AM decides to restrict various operations
like Union and Intersection to such canonical structures, and see
what happens.
The user tells AM that such canonical structures should be called
NUMBERS.
Well, now that you've seen what AM says, let's see how it does it.
.S(Finding Examples of Equality)
Recall that, initially, I supplied by hand a definition or two for
each of the 115 concepts AM started with. Each such definition was a
LISP predicate. I also gave each concept a rough numeric Worth value.
Each concept which was an operation also received an algorithm or
two, and a statement of domain/range. Finally, I introspected and
collected all the general heuristics I could, and tacked them onto
the appropriate concepts. Most of these heuristics applied to the
general concepts, like Operation. Very few applied to the specific
concepts, like Equality or Reverse.
Equality was one of the initial base concepts. It originally had a
very low Worth rating. AM began to notice, e.g., sets all of whose
members were equal, and gradually boosted Equality's Worth rating.
When Equality became valuable enough, a general heuristic rule said
to consider filling in some examples of it. This was added to the
agenda, and eventually AM got around to that task.
How in fact did AM find examples of objects which were EQUAL, once
that job was chosen? The job said to fill in the EXAMPLES facet of
the EQUAL concept. To do that, AM gathered heuristics by rippling
outward <SLIDE: Genls(Equality)> from Equal.
There were no heuristics located here, but some were found on each of
the higher, more general concepts.
One of these (Any-concept) rules of thumb said to instantiate the
definition of the predicate EQUAL. Three definitions of Equal were
supplied by me initially. Here is one recursive definition of EQUAL:
<defn of EQUAL slide>. AM possesses several syntactic rules for
instantiating LISP predicates, which is all that this definition
really is.
An easy way to instantiate this, to satisfy this, would be simply to
satisfy this base step. To do that, the two arguments must be
identical atoms. So some examples like NIL=NIL, T=T, are produced.
Also, AM inverts this recursive step, by plugging in here objects
already known to be equal. But AM just found that NIL=NIL. So if
the CARs of 2 structures are both T, and their CDRs are both NIL,
then the structures will be Equal. So an example like (List T) =
(List T) is generated.
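In outline, this instantiation process looks like the following Python stand-in for what AM did to its LISP definition (the atom list and the pair representation are illustrative):

```python
# Two syntactic tricks for generating examples of a recursive EQUAL:
# satisfy the base step, then invert the recursive step.
atoms = ["NIL", "T"]

def base_examples():
    """Satisfy the base step: two identical atoms are EQUAL."""
    return [(a, a) for a in atoms]

def invert_recursion(known_pairs):
    """Invert the recursive step: if (a, b) and (c, d) are known equal
    pairs, then the structures (a . c) and (b . d) are equal too."""
    return [((a, c), (b, d))
            for (a, b) in known_pairs for (c, d) in known_pairs]

examples = base_examples()               # NIL=NIL, T=T
examples += invert_recursion(examples)   # e.g. (List T) = (List T)
```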
An entirely different approach is suggested by a heuristic tacked
onto the Activity concept. It says that, when filling in examples of
any activity, including a predicate, we can randomly instantiate the
arguments, run the concept, and observe the results. So AM accesses
the domain/range facet of EQUAL, sees there the entry: a pair of
objects gets mapped into True/false. Then AM repeatedly picks random
pairs of objects and sees if they satisfy EQUAL, by running the
little program stored in the Algorithm facet of Equal. This is where
the empirical data comes from that tells AM how rarely EQUAL is
satisfied.
We already saw the heuristic that said to consider generalizing the
definition of a predicate, if very few examples can be found. So a
new job is added to the agenda, which says to generalize the
definition of EQUAL. Since it has a good reason, it gets a high
priority value, and it soon rises to the top of the job-list.
Note that the techniques for filling in all those examples of
Equality were drawn from very general concepts: ways to fill in any
concept, ways to fill in any active concept. There were no
special-purpose tricks, no specialized heuristics for filling in
examples of Equality.
Similarly, AM used very general techniques for noticing that this
predicate should be generalized, and for generalizing its definition.
AM in fact was given only a few syntactic rules for generalizing a
LISP predicate, and therefore some semantically-uninteresting
generalizations of Equal were created. After a while, AM noticed
more and more relationships involving one of those new concepts, and
its Worth became higher and higher.
.S(Generalizing Equality)
Assume that AM has plucked that job off the agenda, the one which
says to generalize the definition of EQUAL. AM magically turns
Equality into Equal-length.
It does this using the syntactic rules for generalizing the
definition of any concept. One rule says to remove a conjunct, so
that the new predicate would recur in only one direction.
In particular, by doing away with the test in the CAR direction <defn
OVERLAY slide>, we get a predicate which counts down two lists, and
returns T if and only if they both become empty at the same time.
That is, iff they have the same length.
A few other concepts were generated which were boring generalizations
of Equality. There are two reasons why syntactic rules suffice: (1)
AM never generalizes X unless it has a very good reason, so this is a
very rare occurrence; (2) very quickly, AM will notice interesting
relationships involving some of these, and lose interest in the
other, boring ones.
<SLIDE: User sees it again> The second magic trick was deriving the
canonical form of a structure. AM did this by selective
experimentation, which I'll describe later if anyone asks.
This whole development was done to satisfy just 3 jobs from the
agenda list:
.BEGIN FILL INDENT 4,0,0 PREFACE 0
(Fillin Exs. of Equality)
(Generalize Defn. of Equality)
(Canonicalize Same-length wrt Equality)
.END
This example is really compressed by a factor of 10; it distorts AM's
abilities and limitations. Let's move at a faster pace through
another, longer, but more faithful excerpt of AM in action.
.S(Example 2)
.BEGIN TURN ON "∞→"
After going through this final example, I'll just describe the
overall results of this project.
We'll start in the middle of a session with AM, after 64 other tasks
have been chosen and executed. AM has already discovered numbers,
arithmetic, squaring, square-rooting, odd numbers, and several other
elementary objects and operators. The concept of Divisors-of a
number has just been defined.
<SLIDE: Ex2>
Task 65: AM was interested in the concept of Divisors-of, and found
several examples of it.
66: The inverse images of various extremal kinds of sets were
considered. AM noticed that numbers-with-3-divisors all seemed to be
perfect squares. How did it notice that relationship? It took one
example of these numbers, 49. AM then asked what 49 was an example
of, besides Numbers-with-3-divisors. The answer was: number,
odd-number, perfect-square. So one hypothesis was that all such
numbers might be perfect squares. This was tested on the other 10
examples of Numbers-with-3-divisors, and sure enough they all were
squares. So AM conjectured that Nos-with-3-divisors is a
specialization of the concept Perfect-squares.
⊗3↓_∞ → _↓⊗*
67: The square-roots of those numbers turned out to be
numbers-with-2-divisors, which the user then renamed "Primes". The
value of both concepts was raised.
⊗3↓_∞ → _↓⊗*
79: Later, AM examined the inverse of Times. For example,
Times↑-↑1(12) is the set of all possible ways of factoring 12. It
contains the bag (2 2 3), since 2x2x3 is 12. AM noticed that
Times↑-↑1 of each number seemed to contain a bag of primes. So a new
concept was created, a new relation which maps a number into all
possible factorizations of that number into primes.
⊗3↓_∞ → _↓⊗*
80: Lo and behold, that relation turned out to be a function. This is
the fundamental theorem of arithmetic.
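The new relation can be sketched as follows (illustrative Python; AM represented the factorizations as bags):

```python
def prime_factorizations(n, smallest=2):
    """All multisets of primes (as nondecreasing tuples) whose product
    is n.  The fundamental theorem of arithmetic is the observation
    that this relation is a function: each n yields exactly one bag."""
    if n == 1:
        return [()]
    results = []
    for p in range(smallest, n + 1):
        if n % p == 0 and all(p % d for d in range(2, p)):  # p is prime
            results += [(p,) + rest
                        for rest in prime_factorizations(n // p, p)]
    return results
```

For example, `prime_factorizations(12)` yields only the bag (2 2 3).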
⊗3↓_∞ → _↓⊗*
84: Lest you think that AM was always this brilliant, let's continue
and look at one of the next things it did. It examined the inverse of
Addition. For example, Add↑-↑1(6) contains the bag (1 1 2 2), since
1+1+2+2 equals 6. One of many conjectures AM notices then is that
Add↑-↑1 of a number often contains a pair of primes. A new relation
is created, which takes a number and returns a list of all possible
ways of representing it as the sum of two primes.
⊗3↓_∞ → _↓⊗*
106: Eventually, AM studies this, and notices that all even numbers
seem to have at least one such representation. This is Goldbach's
conjecture. AM doesn't rate this conjecture very highly, since
primes are not intimately connected with addition.
107: Here, AM isolates those numbers which can be represented as the
sum of 2 primes in only one way. While this is a novel set of numbers
to investigate, AM wasn't able to find anything interesting out about
them.
⊗3↓_∞ → _↓⊗*
.END
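Tasks 84 through 107 can be condensed into a few lines of illustrative Python (not AM's code):

```python
def is_prime(n):
    """Trial-division primality test."""
    return n > 1 and all(n % d for d in range(2, n))

def prime_pair_sums(n):
    """Task 84: all ways of writing n as an unordered sum of two primes."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]

# Task 106, Goldbach's conjecture: every even number from 4 up has at
# least one such representation (checked empirically, as AM did).
evens = range(4, 100, 2)
goldbach_holds = all(prime_pair_sums(n) for n in evens)

# Task 107: the numbers representable as a sum of two primes in
# exactly one way.
uniquely_representable = [n for n in evens if len(prime_pair_sums(n)) == 1]
```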
.S(Results)
Now that we've seen a few examples of AM in action, let's step back
and look at the whole run, of which these were just excerpts.
This run occurred a few months ago. AM started with the 115
prenumerical concepts I showed you earlier, with about 8 facets
filled in for each: typically, the name, definition, worth,
domain/range, algorithm, fillin heuristics, checking heuristics, and a
pointer to a known generalization.
AM worked completely by itself for 1 cpu hour, when it ran out of
space and died. AM had filled in a total of 200 previously-blank
facets on these primitive concepts, and it had created about 135 new
concepts. The new concepts ended up with an average of 10 facets
filled in. Half of those new ones were not really worth defining,
and were technically called losers, but the rest were interesting.
<SLIDE: Winners>. Note that AM went through the whole chain of discoveries
I talked about during this talk, all the way from set-theoretic
notions to numbers to unique factorization. Several new concepts of
questionable value were derived.
AM did this all on its own; if I sit at the keyboard and guide it
along, the same discoveries can be made in about a third the cpu
time. <Results again>
200 jobs from the agenda were executed during the cpu hour AM used.
Each job was allocated from 1 to 100 cpu seconds, depending on its
priority value. Most jobs were granted about 30 seconds, but used
under 20.
The 1-hour run ended because of space problems, and because AM
couldn't fill in very good heuristics for the new concepts it
created. As it got further and further from its initial starting
basis, the methods it was able to bring to bear on its new concepts
got relatively weaker and weaker. For example, it was dealing with
primes using set-theoretic heuristics.
To correct this, AM would need a body of good meta-heuristics; that
is, rules for finding and filling in new rules. At the moment, it
doesn't have any. Since most heuristics are based on hindsight, I
would guess that most of the meta-heuristics couldn't be applied
until AM has already made most of the possible mistakes it could
make. I'm not sure, and I leave this as an open research problem
<SLIDE: open problem>.
All was not perfect, of course. Several tasks were chosen at bizarre
moments, and many poor tasks were chosen. <SLIDE: Losers again> One
danger that AM is prone to is that of regular or looping behavior.
Any kind of repetition should -- but doesn't -- warn AM to be careful
and look for a unifying generalization of what it's doing, to rise
above the infinite loop it's fallen into. That's how this concept
arose (ComposeoCompose...). Similarly, AM repeated multiplication,
then repeated exponentiation, then repeated hyper-exponentiation,...
Although any specific example I give you might have been fixed by
adding one simple rule, the general problem of detecting regularity,
infinite loops, is quite difficult and I leave that as another open
problem.
IF ASKED: A HANDLE ON THE SOLUTION IS:
.ONCE INDENT 5,5,5
By investigating each concept it defined, and basing its
interestingness judgments on those results, AM was able to avoid most
such traps.
.S(Experiments With AM)
One important reason for actually constructing AM was that of
⊗4experimentation⊗*: one can vary the concepts AM starts with, vary
the heuristics available, etc., and study the effects on AM's
behavior. <SLIDE: expts> Several such experiments were performed.
One involved adding a couple dozen new concepts from an entirely new
domain: plane geometry. AM busied itself exploring elementary
geometric concepts, and was almost as productive there as in its
original domain. New concepts were defined, and new conjectures
formulated. For example, AM had the notion of identical equality of
lines and of triangles, and generalized these into similarity,
parallel, congruence etc. It noted many new concepts, like triangles
sharing a common side and a common angle. AM also found a cute
geometric interpretation of Goldbach's conjecture, which it had
proposed earlier:<SLIDE: prime angles> Any angle, from 0 to 180↑o,
can be approximated (to within 1↑o) as the sum of two angles of prime
degree. If our culture and technology were different, this might
have been an important, familiar result. Incidentally, much sharper
results and much more general ones were also obtained.
<SLIDE: expts again> Several of the experiments on AM involved
de-tuning the system, e.g. mutilating the formula which assigned
each task a priority rating. Or: radically altering the initial
Worth numbers of the concepts. Generally, AM's behavior was
degraded, but overall it was much more robust than anticipated.
An interesting result was that AM uses precise numeric values very
rarely, only in very blind situations, where no decent reasons exist.
When AM enters a chain of discoveries, it's sustained by masses of
reasons, not by nuances in their numeric ratings.
Another unexpected result of this experiment was that the top few
jobs on the agenda almost always suggest new high-priority jobs, so
that a job sitting in tenth place on the agenda at some moment has a
very low chance of ⊗4ever⊗* being chosen, unless of course some new
reasons for it emerge. So even if you had a kilo-processor machine,
executing all the tasks on the agenda simultaneously, it wouldn't
speed up the rate at which AM makes great discoveries by more than a
single factor of ten. Worse, it would actually reduce the user's
opinion of AM's rationality: a lot of that stems from AM's choice of
which task is best at each moment.
Let me move on to another kind of experiment that was performed. It
was expected that a few key concepts (e.g. Equality) and heuristics
(e.g., define "f" as repeated "g"-ing, where g is an interesting
operation) would have a tremendous impact on AM's behavior: that
removing them would destroy most of AM's discoveries. This was
partly true, but a surprising effect was noticed when these
experiments were actually done last week. Many of the very common and
valuable concepts manage to be rediscovered in many separate ways, so
that the absence of any single concept or heuristic isn't critical.
Multiplication arises in four separate ways, so one would have to
remove many concepts to prevent its being noticed.
Several more experiments have been planned for the near future. This
summer, I intend to add about twenty new concepts, dealing with proof
and proof techniques, to see how the reasons for conjecturing a
theorem might aid in its subsequent proof.
To me personally, this whole business of experimenting with AM has
been the most exciting aspect of the project. It's like using AM as
a tool, to explore the process of discovery -- at least discovery in
very elementary mathematics.
.S(Maximally-Divisible Numbers)
While working on its own, AM has not discovered any mathematics which
is new and publishable, although it has done so when working as a
co-researcher with a human. AM noticed simple new concepts which most
humans had overlooked, but AM by itself was not able to precisely
formulate interesting statements about those concepts.
Besides the cute Goldbach interpretation just described, AM motivated
the development of a new result in number theory. I'll close this
slide show with one slide about it. <SLIDE: AM conjec>
Just as AM decided to look at numbers with extremely ⊗4few⊗*
divisors, it thought to look at those with abnormally ⊗4many⊗*
divisors. For any number d, the smallest number with at least d
divisors is at least this big. If it's factored into primes, then
their exponents decrease inversely as the logs of the primes. We can
substitute in an estimate for k in terms of d, and the higher one can
prove β (>2) is, the better this result.
For example, here's the smallest number with 6 million divisors, and
in fact the exponents of its prime factors satisfy this constraint
fairly closely. The AM conjecture predicted that this number n would
be this big; it was off by less than a factor of two, so it's a very
sharp bound. Erdos recently showed me the best previously-known
formula bounding n(d), and it could only guarantee that n would be
bigger than this, so it was off by more than a whole order of
magnitude.
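The phenomenon itself is easy to check by brute force on small cases (this is a check of the claim, not AM's derivation; all names are mine):

```python
def num_divisors(n):
    """Count the positive divisors of n by trial division."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def smallest_with_divisors(d):
    """The smallest number with at least d divisors."""
    n = 1
    while num_divisors(n) < d:
        n += 1
    return n

def exponents(n):
    """Exponent of each prime in n's factorization, smallest prime first."""
    exps, p = [], 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e:
            exps.append(e)
        p += 1
    return exps
```

For instance, the smallest number with at least 6 divisors is 12 = 2↑2x3, and its exponents (2, 1) decrease as the primes grow, as the conjecture predicts.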
AM made this definition, found examples of such highly-composite
numbers, and then suggested something like this relationship. Later,
by hand, I was able to derive these formulae, and with Erdos' help
this one.
A couple months ago, I found out that Ramanujan defined these same
numbers and found a similar regularity to this one. As far as is
known, this particular formula is new, but of no tremendous interest
to anyone.
It was AM which expected that this kind of maximally-divisible number
would be very interesting. I thought I knew better, but in this case
AM really was right.
In dozens of other instances, AM seemed to me to be going nowhere and
in reality it was actually going nowhere. It outsmarted me in this
cute way only twice.
.S(Conclusions)
What is the final conclusion of all this?
AM is forced to judge ⊗4a priori⊗* the value of each new concept; it
⊗4must⊗* quickly lose interest in concepts which aren't going to
develop into anything. Often, such judgments can only be based on
hindsight. For similar reasons, AM has difficulty formulating new
heuristics which are relevant to the new concepts it creates. It
needed good meta-heuristics, and I don't even know whether any exist
for this domain.
While AM's "approach" to empirical research may be used in other very
hard scientific domains, the main limitation (reliance on hindsight)
will probably recur. This prevents AM -- and future similar systems
-- from progressing indefinitely far on their own.
Before this ultimate limitation was reached, AM had in fact carried
out some math research. The research was of a very elementary
character, and the same program may not be able to carry out much
more sophisticated theory formation, but nevertheless we can conclude
that certain kinds of hack discoveries, certain routine research
tasks, can be represented adequately as a rule-guided heuristic
search.
.SKIP TO COLUMN 1
*********************************************************************
Ideally, one could run AM overnight, and wake up the next morning to
find five or ten more or less interesting new concepts to look at.
It would still be the researcher who made the highest-level
selections of what to do next, much like a professor directing
graduate students. This is the ultimate use that I hope such systems
will be put to -- probably in the next century: to work synergistically
alongside a researcher, to free him from doing all the routine
detective work.
.S(Supplement)
↓_ Finding canonizing function for numbers_↓
Experiment number 1 was to convert the arguments from one type of
structure to another, and see if that changed the value of
SAME-LENGTH when it was applied to them. It turned out not to, which
meant that one single canonical type of structure could be chosen.
But there are 4 kinds it could be: a set, a list, a bag, or an
ordered set. The particular kind of structure depends on 2 things:
whether altering the order of the elements in the arguments makes any
difference to SAME-LENGTH, and whether adding multiple occurrences of
the same element makes any difference to SAME-LENGTH.
But changing the order makes no difference, so the canonical
structure type must be unordered, either BAG or SET. But adding
multiple elements ⊗4does⊗* make a difference, so the type must be BAG
or LIST. The only possible canonical type is therefore BAG.
Next, AM notices a message in the Ties facet of SAME-LENGTH, that
explicitly says that it's the same as Equality, except that
SAME-LENGTH does not recurse in the CAR direction. The CAR of an atom
is its value cell, so that message is enough evidence to warrant
testing whether or not the value cells of any of these elements are
ever accessed. Empirically, they never are. So AM assumes it can
replace them all by a single, fixed value: say "T".
Putting this all together, we see the canonizing transformation as:
convert the structure to a BAG, and substitute T for each element.
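Put in (illustrative) Python, the transformation AM arrived at is just:

```python
def canonize(structure):
    """Convert to a bag and replace every element by the constant T.
    (A bag of identical elements is represented here as a plain list.)"""
    return ["T"] * len(structure)

def same_length(x, y):
    """Two structures have the same length iff their canonical forms
    are identically equal."""
    return canonize(x) == canonize(y)
```

The canonical form of the 26-letter alphabet, for instance, is a bag of 26 T's -- the kind of structure AM was told to call a NUMBER.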
**********************************************************************
Another thought I hope you'll leave with today is that (for this task
at least) if the heuristic operators are sophisticated enough, then a
simple numeric calculus is all the meta-level heuristics you need for
adequate performance. And you don't need any explicit
meta-meta-heuristics at all. It would be interesting to see if this
is really true for other tasks as well. I suspect that this is true
whenever the heuristics are fairly independent of each other.
One problem we all face is how to cope with changes in a system's
knowledge base. The standard solution is to track down all the
effects of each change, and update the whole data base. The solution
AM uses, and perhaps people too, is to just record the changed
information, and ⊗4not⊗* to update everything. When a contradiction
of some kind occurs, then the conflicting knowledge is checked, the
knowledge ⊗4it⊗* was based on is checked, and so on, until we
encounter some changed opinion which explains the disagreement. So
the system is allowed to hold contradictory viewpoints, so long as it
could resolve any dispute if it had to. This is one solution you
might keep in mind for your knowledge-based systems. I think that
⊗4this⊗* works because there are so few contradictions; that in turn
is due to the independence of the reasoning involved.
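A sketch of that lazy scheme (illustrative Python; the belief names are made up):

```python
# Record each belief with the beliefs it was based on; never propagate
# a change.  On a contradiction, walk the support chains back until a
# since-changed premise explains the disagreement.
beliefs = {}   # name -> {"value": ..., "support": [names], "changed": bool}

def assert_belief(name, value, support=()):
    beliefs[name] = {"value": value, "support": list(support),
                     "changed": False}

def change_belief(name, value):
    beliefs[name]["value"] = value
    beliefs[name]["changed"] = True   # note the change; do NOT update dependents

def explain_conflict(name):
    """Trace support back to the first changed premise, if any."""
    seen, stack = set(), [name]
    while stack:
        b = stack.pop()
        if b in seen:
            continue
        seen.add(b)
        if beliefs[b]["changed"]:
            return b
        stack.extend(beliefs[b]["support"])
    return None

assert_belief("few-equal-pairs", True)
assert_belief("generalize-EQUAL", True, ["few-equal-pairs"])
change_belief("few-equal-pairs", False)   # later evidence reverses it
```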
.S(Questions and Answers)
.BEGIN NOFILL
(1) Contribution to knowledge? Using the historical paradigm of
heuristic search to automate some of the activities involved in
developing new math theories: directing attention to plausible
concepts to investigate, finding the desired empirical data, inducing
new concepts to study based on the results of previous
investigations.
(2) What exactly are the primitives of AM's behaviors? AM can define
new concepts by joining or modifying defns of existing concepts. AM
can use the heuristic rules it possesses to fill in new entries for
concepts. AM can itself suggest new tasks for its own agenda. AM
can modify facets (e.g., Worth ratings), print messages, prune away
losers, etc. This is such a universal set of behaviors that,
unconstrained, it has little to do with math research. The
plausibility constraints (heurs) give it its character.
(3) What is another use for the heuristic rule which says...
Look at f↑-↑1(b):
groups with very few subgroups;
maximally-divisibles;
f=divisors-of, b=tripletons: result is the squares of primes;
Generalize f if very few examples:
Congruence→Similarity;
Reverse-all-levels→Reverse-top-level
Perfects→Multiply-perfects
(4) Explain the intu's, their failure, etc. These were opaque
(uninspectable by AM) functions which were simulations of real-world
scenarios: seesaws, elevators, archery, etc. AM was supposed to be
able to analogize between various concepts and certain intu's (e.g.,
set(ops) and Venn diagrams). Unfortunately, this contained too much
pre-programmed help (e.g., all the Venn relationships are true, and
they are too easy to get that way; OR: See-saws are anti-symmetric,
and that relation is too easy to get from Seesaws). It wasn't fair:
no relations/concepts were discovered which the user hadn't foreseen
at the time the intuitions were coded. So they were all excised, and
not used to make any of the discoveries mentioned herein.
(5) 4 ways to get Multiplications:
Repeated +
Analogue to cross-product: Count(AxB)= [Count(A) ? Count(B)]
Power sets of union: Count(2↑[A∪B]) = [Count(2↑A) ? Count(2↑B)]
Subst A for each element of B, then apply Union to the result.
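The four routes in (5) can be checked to agree on a small case. This is my own illustration (the "?" operations on the slide are left as posed; the check below simply confirms that multiplication satisfies each identity for disjoint A and B with |A| = 3, |B| = 2).

```python
# Four routes to multiplication, verified on |A| = 3, |B| = 2.
from itertools import product

A, B = {"a1", "a2", "a3"}, {"b1", "b2"}   # disjoint sets

def repeated_add(m, n):                    # route 1: repeated +
    total = 0
    for _ in range(m):
        total += n
    return total

cross = len(list(product(A, B)))           # route 2: Count(AxB)

# Route 3: for disjoint A, B, Count(2^(A∪B)) = Count(2^A) * Count(2^B).
pow_union = 2 ** len(A | B)

# Route 4: substitute a copy of A for each element of B, then union.
subst = len({(b, a) for b in B for a in A})

assert repeated_add(3, 2) == cross == subst == 6
assert pow_union == (2 ** len(A)) * (2 ** len(B))   # 32 == 8 * 4
```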
(6) What does "A. M." stand for? SAM slide: Jim Guard and Eastman.
(7) Uses for AM:
The heuristics themselves:
Everything that AM does can be viewed as testing the underlying body
of heuristics. If AM ever succeeds in a big way, then it might be worthwhile
teaching these heuristics explicitly to math students, just as it might
be a good idea for medical students to learn MYCIN-like rules.
AM itself: get people interested in math, give them a feel for research
Existence of AM: feasibility of automating this kind of process using Heur search
Actually constructing a computer model of this activity has provided
an experimental vehicle for studying the dynamics of plausible
empirical inference.
.END
Suggestions from EAF:
Don't try to defend AM as the right way to automate math research;
rather, defend it as a continuation of a historical line of
application programs, using heuristic search to automate various
aspects of research in assorted sciences. There are some new wrinkles
(agenda, large body of heuristic plausible move generators, allowing
the system to define new concepts and explore them, to give up
whenever it wanted to, etc.). Defend it on the basis of being merely ↓_A_↓
contribution to knowledge, not THE answer to everything.
Emphasize that heuristic search will cause AM to explore a few narrow
ribbons of chains of discoveries. Vast numbers of valuable concepts
will be missed this way, but it is ONE way to beat the combinatorial
explosion. Even so, we saw a minor explosion of red heuristic
arrows! Alternate schemes (e.g., mutate and select) might work, but
they are different: neither better nor worse.